
[webserver] Accept user-provided tags on the report_asset_materialization and report_asset_observation REST endpoints #33919

Open
raghav-reglobe wants to merge 1 commit into dagster-io:master from raghav-reglobe:report-asset-materialization-tags


raghav-reglobe commented Jun 10, 2026


Summary & Motivation

DagsterInstance.report_runless_asset_event supports arbitrary event tags via AssetMaterialization(tags=...) / AssetObservation(tags=...) — but the /report_asset_materialization/ and /report_asset_observation/ REST endpoints accept only their allowlisted params and silently drop everything else. External writers in non-Python languages (JVM/Go services reporting events for external assets over REST) therefore cannot attach data-version provenance tags such as dagster/input_data_version/<upstream> / dagster/code_version, which are exactly what makes externally-materialized assets participate in data-version/staleness machinery.

This adds an optional tags parameter to both endpoints:

  • JSON body object, or json-encoded query param — mirroring the existing metadata handling (including the 400 on parse failure), plus a 400 when tags is not a json object.
  • No new validation surface: tags flow into the existing event construction, where validate_asset_event_tags already exempts system asset-event tags and strict-validates the rest; failures surface through the endpoints' existing 400 path.
  • The dedicated data_version param takes precedence over a conflicting dagster/data_version tag (user tags merge first, param applies after).
  • ReportAssetMatParam / ReportAssetObsParam gain tags; the materialization API-consistency test's KNOWN_DIFF documents that Pipes does not take tags (same as partition/description).
  • Drive-by: fixes a copy-paste typo in the observation handler's construction-error message (it said "Error constructing AssetMaterialization").
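The tags lookup described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in external_assets.py: `resolve_user_tags` and `BadRequest` are hypothetical names standing in for the handler logic and its 400 response, and only the two error messages are taken from this PR.

```python
import json
from typing import Any, Mapping, Optional


class BadRequest(Exception):
    """Hypothetical stand-in for the handler returning a 400 response."""


def resolve_user_tags(
    json_body: Mapping[str, Any], query_params: Mapping[str, str]
) -> Optional[dict]:
    """Return user tags from the JSON body, else from a JSON-encoded query param."""
    if "tags" in json_body:
        # Body values arrive already decoded.
        user_tags = json_body["tags"]
    elif "tags" in query_params:
        # Query params are strings, so the value has to be JSON-decoded first.
        try:
            user_tags = json.loads(query_params["tags"])
        except json.JSONDecodeError as exc:
            raise BadRequest(f"Error parsing tags json: {exc}") from exc
    else:
        return None

    # Either source must yield a JSON object (a dict after decoding).
    if not isinstance(user_tags, dict):
        raise BadRequest("Expected tags to be a json object")
    return user_tags
```

Checking the body before the query string keeps the behaviour the same as the description gives for metadata, so a caller sending both gets the body value.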

Asset checks are intentionally left out — ReportAssetCheckEvalParam has a different shape (severity/passed) and AssetCheckEvaluation has no equivalent tags concept.

Context: we currently run the materialization half as a small runtime patch in production (JVM relays report Iceberg-commit materializations for ~16K external assets with input_data_version tags); upstreaming removes the need to carry it.

Test Plan

Extended dagster_webserver_tests/webserver/test_asset_events.py:

  • materialization: tags via json body (system dagster/input_data_version/... + custom key), tags via json-encoded query param, data_version param precedence over a conflicting tag, 400 on non-json query param, 400 on non-object body tags
  • observation: tags via json body + data_version precedence, 400 on non-object tags
  • both API-consistency tests updated (sample_payload + per-key validation; materialization KNOWN_DIFF)

All 10 tests in the file pass locally.

Changelog

The /report_asset_materialization/ and /report_asset_observation/ REST endpoints now accept an optional tags parameter (json object), allowing runless asset events reported over REST to carry event tags such as data-version provenance — matching the existing Python SDK capability.
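For illustration, a client call using the new parameter might look like the sketch below. The webserver address `http://localhost:3000`, the asset key `my_asset`, and the helper name `build_report_request` are assumptions made for the example; the endpoint path, the `tags` and `data_version` fields, and the tag keys come from this PR. The request is only constructed, not sent, so the sketch runs offline.

```python
import json
import urllib.request
from typing import Mapping, Optional


def build_report_request(
    base_url: str,
    asset_key: str,
    tags: Mapping[str, str],
    data_version: Optional[str] = None,
) -> urllib.request.Request:
    """Build a POST to /report_asset_materialization/ with tags in the JSON body."""
    payload: dict = {"tags": dict(tags)}
    if data_version is not None:
        payload["data_version"] = data_version
    return urllib.request.Request(
        url=f"{base_url}/report_asset_materialization/{asset_key}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_report_request(
    "http://localhost:3000",  # assumed webserver address
    "my_asset",  # assumed asset key
    {
        "dagster/input_data_version/upstream/key": "12345",
        "my_tag": "my_value",
    },
)
# urllib.request.urlopen(request) would send it to a running webserver.
```

A JVM or Go client would send the same JSON body; nothing in the payload is Python-specific.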

🤖 Generated with Claude Code

raghav-reglobe force-pushed the report-asset-materialization-tags branch from 8e7cf3a to d1d691e on June 10, 2026 15:27
raghav-reglobe changed the title from "[webserver] Accept user-provided tags on the report_asset_materialization REST endpoint" to "[webserver] Accept user-provided tags on the report_asset_materialization and report_asset_observation REST endpoints" on Jun 10, 2026
raghav-reglobe force-pushed the report-asset-materialization-tags branch from d1d691e to fd9ef87 on June 23, 2026 04:16
raghav-reglobe marked this pull request as ready for review on June 23, 2026 04:34
greptile-apps Bot commented Jun 23, 2026

Contributor

Greptile Summary

This PR adds an optional tags parameter (JSON object) to the /report_asset_materialization/ and /report_asset_observation/ REST endpoints, mirroring the existing metadata parameter pattern (JSON body or JSON-encoded query param, 400 on parse or type errors). It also fixes a copy-paste typo in the observation handler's error message (AssetMaterialization → AssetObservation).

  • Tags are merged into the event before the dedicated data_version param is applied, so data_version always wins on a key conflict; the tags then flow through the existing validate_asset_event_tags validation path at construction time.
  • ReportAssetMatParam and ReportAssetObsParam gain a tags class attribute; the materialization API-consistency test's KNOWN_DIFF is updated to document that PipesContext.report_asset_materialization does not yet expose tags.
  • Tests cover JSON-body tags, JSON-encoded query-param tags, data_version precedence, and 400 error paths for both endpoints.
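The merge order summarized above can be sketched as follows. `merge_event_tags` is a hypothetical helper, and `base_tags` stands in for whatever the handler starts from; the `dagster/data_version` key is the one named in the PR description. The sketch covers only the data-version key and leaves out any other tags the handler sets alongside it.

```python
from typing import Mapping, Optional

DATA_VERSION_TAG = "dagster/data_version"  # key named in the PR description


def merge_event_tags(
    base_tags: Mapping[str, str],
    user_tags: Optional[Mapping[str, str]],
    data_version: Optional[str],
) -> dict:
    """Apply user tags first, then let the dedicated data_version param override."""
    tags = dict(base_tags)
    if user_tags:
        tags.update(user_tags)
    if data_version is not None:
        # Written last, so it replaces any conflicting user-supplied tag.
        tags[DATA_VERSION_TAG] = data_version
    return tags


merged = merge_event_tags(
    base_tags={},
    user_tags={DATA_VERSION_TAG: "tag_loses", "my_tag": "my_value"},
    data_version="param_wins",
)
```

Because the param is written last, precedence does not depend on the order of keys inside the user-supplied object.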

Confidence Score: 5/5

Safe to merge — the change is additive, follows existing patterns precisely, and is well-tested.

The tags parameter is wired in identically to the existing metadata parameter (body-first, then JSON-encoded query param, dict type check, 400 on failure). User tags are merged before the data_version param so precedence is deterministic and documented. Validation flows through the pre-existing validate_asset_event_tags path at construction time. The typo fix in the observation error message is straightforward. No edge cases were found that are not already covered by the existing or new tests.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| python_modules/dagster-webserver/dagster_webserver/external_assets.py | Adds optional tags parameter to materialization and observation REST handlers using a pattern consistent with the existing metadata handling; also fixes a copy-paste typo in the observation error message. |
| python_modules/dagster-webserver/dagster_webserver_tests/webserver/test_asset_events.py | Extends materialization and observation endpoint tests to cover user-provided tags via JSON body, JSON-encoded query params, data_version precedence, and 400 error cases; updates API consistency tests to include tags in sample payloads and KNOWN_DIFF. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[POST /report_asset_materialization/ or /report_asset_observation/] --> B{tags in json_body?}
    B -- Yes --> C[user_tags = json_body tags]
    B -- No --> D{tags in query_params?}
    D -- Yes --> E[json.loads query_param tags]
    E -- parse error --> F[400: Error parsing tags json]
    E -- OK --> C
    D -- No --> G[user_tags = None]
    C --> H{isinstance user_tags dict?}
    H -- No --> I[400: Expected tags to be a json object]
    H -- Yes --> J[tags = get_reporting_user_tags]
    J --> K[tags.update user_tags]
    G --> J2[tags = get_reporting_user_tags]
    K --> L{data_version param set?}
    J2 --> L
    L -- Yes --> M[tags overwrite DATA_VERSION_TAG and IS_USER_PROVIDED_TAG]
    L -- No --> N[Construct AssetMaterialization / AssetObservation with final tags]
    M --> N
    N -- construction error --> O[400: Error constructing ...]
    N -- OK --> P[report_runless_asset_event]
    P --> Q[200 OK]

Reviews (2): Last reviewed commit: "[webserver] accept user-provided tags on..." | Re-trigger Greptile

Comment on lines 393 to +422
obs = _assert_stored_obs(instance, my_asset_key)
assert obs.data_version == "fresh"

# user-provided tags (json body); dedicated data_version param wins over a conflicting tag
response = test_client.post(
    f"/report_asset_observation/{my_asset_key}",
    json={
        "data_version": "param_wins",
        "tags": {
            "dagster/input_data_version/upstream/key": "12345",
            "my_tag": "my_value",
            DATA_VERSION_TAG: "tag_loses",
        },
    },
)
assert response.status_code == 200, response.json()
obs = _assert_stored_obs(instance, my_asset_key)
tags = obs.tags
assert tags
assert tags["dagster/input_data_version/upstream/key"] == "12345"
assert tags["my_tag"] == "my_value"
assert tags[DATA_VERSION_TAG] == "param_wins"

# bad tags: not an object
response = test_client.post(
    f"/report_asset_observation/{my_asset_key}",
    json={"tags": "im_just_a_string"},
)
assert response.status_code == 400
assert "Expected tags to be a json object" in response.json()["error"]

Contributor

P2 Missing query-param tags coverage for observation

The materialization tests exercise both the JSON-body path and the JSON-encoded query-param path for tags, but the observation tests only cover the JSON-body path. The handler code at lines 330–339 of external_assets.py does include the elif ReportAssetObsParam.tags in request.query_params branch for observation — a quick params={"tags": json.dumps(...)} case analogous to the materialization test would confirm that path is wired correctly.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
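One possible shape for the suggested observation case is sketched below. `tags_query_params` is a hypothetical helper; the commented lines assume the `test_client`, `instance`, `my_asset_key`, and `_assert_stored_obs` names from the existing test module and are left as comments so the snippet runs on its own.

```python
import json
from typing import Mapping


def tags_query_params(tags: Mapping[str, str]) -> dict:
    """Encode tags for the query-param path: a single JSON string under "tags"."""
    return {"tags": json.dumps(dict(tags))}


params = tags_query_params(
    {
        "dagster/input_data_version/upstream/key": "12345",
        "my_tag": "my_value",
    }
)

# Inside the existing observation test, the case might then read:
# response = test_client.post(
#     f"/report_asset_observation/{my_asset_key}",
#     params=params,
# )
# assert response.status_code == 200, response.json()
# obs = _assert_stored_obs(instance, my_asset_key)
# assert obs.tags["my_tag"] == "my_value"
```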

… + report_asset_observation

The Python SDK supports arbitrary tags on runless asset events
(AssetMaterialization/AssetObservation tags=...), but the REST endpoints
silently drop any field outside their allowlists — so external (non-Python)
writers reporting events over REST cannot attach data-version provenance
tags like dagster/input_data_version/<upstream>. This adds an optional
'tags' param to both endpoints (json body, or json-encoded query param
mirroring 'metadata' handling). Validation is unchanged: tags flow into the
existing event construction, where validate_asset_event_tags already exempts
system asset event tags and strict-validates the rest, surfacing errors via
the existing 400 path. The dedicated data_version param takes precedence
over a conflicting dagster/data_version tag. Also fixes a copy-paste typo in
the observation handler's construction-error message (said
AssetMaterialization).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>